A Localness-Filter for Searched Web Pages

Authors

  • Qiang Ma
  • Chiyako Matsumoto
  • Katsumi Tanaka
Abstract

As the Internet spreads, information about our daily lives and the regions where we live is becoming increasingly common on the WWW (World Wide Web). That is to say, there are many Web pages whose content is 'local' and may interest only the residents of a narrow region. Conventional information retrieval systems and search engines, such as Google [1] and Yahoo [2], are very useful for helping users find interesting information. However, it is not yet easy to find or exclude 'local' information about our daily lives and residential regions. In this paper, we propose a localness-filter for searched Web pages, which can discover and exclude such information from the searched Web pages. We compute the localness degree of a Web page by 1) estimating its region dependence, from the frequency of geographical words and the content coverage of the page, and 2) estimating the ubiquitousness of its topic; in other words, we estimate whether it is usual information that appears every day and everywhere in our daily lives.
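
The abstract names the two estimates but gives no formulas, so the following Python sketch is only one possible reading of it. The gazetteer, the bounding-box measure used for "content coverage", the Jaccard test used for "ubiquitousness", the linear combination with weight `alpha`, and all function names and thresholds are assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical gazetteer (place name -> latitude, longitude); not from the paper.
GAZETTEER = {
    "kyoto": (35.01, 135.77),
    "osaka": (34.69, 135.50),
    "kobe": (34.69, 135.20),
    "tokyo": (35.68, 139.69),
    "sapporo": (43.06, 141.35),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def region_dependence(tokens, max_area=10.0):
    """Assumed score in [0, 1]: place-name frequency times coverage narrowness."""
    places = [t for t in tokens if t in GAZETTEER]
    if not places:
        return 0.0
    frequency = len(places) / len(tokens)
    lats = [GAZETTEER[p][0] for p in places]
    lons = [GAZETTEER[p][1] for p in places]
    # Bounding-box area in squared degrees stands in for "content coverage".
    area = (max(lats) - min(lats)) * (max(lons) - min(lons))
    narrowness = max(0.0, 1.0 - area / max_area)
    return frequency * narrowness


def ubiquitousness(tokens, background, threshold=0.2):
    """Assumed score in [0, 1]: share of background pages on a similar topic."""
    if not background:
        return 0.0
    # Place names are dropped so that only the topic words are compared.
    topic = {t for t in tokens if t not in GAZETTEER}
    similar = 0
    for other in background:
        other_topic = {t for t in other if t not in GAZETTEER}
        union = topic | other_topic
        jaccard = len(topic & other_topic) / len(union) if union else 0.0
        if jaccard >= threshold:
            similar += 1
    return similar / len(background)


def localness(text, background, alpha=0.5):
    """Assumed linear combination of the two estimates."""
    tokens = tokenize(text)
    return (alpha * region_dependence(tokens)
            + (1 - alpha) * ubiquitousness(tokens, background))


def filter_local(pages, background, cutoff=0.3):
    """Keep only pages whose localness degree stays below the cutoff."""
    return [p for p in pages if localness(p, background) < cutoff]


# Usage: a tiny made-up background corpus and two made-up search results.
background = [
    tokenize("summer festival this weekend in osaka"),
    tokenize("summer festival near kobe station"),
    tokenize("parliament passes budget bill"),
]
pages = [
    "Summer festival in Kyoto this weekend near Kyoto station",
    "Flights link Tokyo Osaka and Sapporo",
]
print(filter_local(pages, background))
```

In this toy setup the festival page mentions a single place and shares its topic with pages from other cities, so it scores as local and is excluded, while the page spanning distant cities is kept. A real system would need a proper gazetteer, stop-word handling, and tuned thresholds.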

Similar resources

Localness Degree of Web Pages and Its Applications from Page Content and Location Information

A vast amount of information is available on the WWW (World Wide Web). Usually, users rely on information filtering technologies or search engines to acquire the information they want. However, it is still not easy to acquire or exclude local information with conventional search engines and information filtering technologies. In this paper, we propose a new notion, localness, to discover loc...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One family of methods for web mining is evolutionary algorithms, which search according to the user interests...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...

Publication date: 2003